Text-to-speech Synthesis System based on Wavenet
Abstract
In this project, we build a novel parametric TTS system. Our model is based on WaveNet (Oord et al., 2016), a deep neural network introduced by DeepMind in late 2016 for generating raw audio waveforms. It is fully probabilistic: the predictive distribution for each audio sample is conditioned on all previous samples. The model brings convolutional layers to the TTS task to extract more useful information from the input data. Because the results of our system are not satisfactory, this paper also discusses the system's defects and problems.
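As a rough illustration of how a convolutional layer can respect the "conditioned on all previous samples" property, the sketch below implements a causal dilated 1-D convolution in plain Python. WaveNet is built from stacks of such layers, but this is only a minimal sketch and not the implementation used in this project: the function names `causal_dilated_conv` and `receptive_field`, the filter weights, and the dilation schedule are all assumptions chosen for the example.

```python
def causal_dilated_conv(x, w, dilation=1):
    """Causal dilated convolution: y[t] = sum_k w[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    any x[s] with s > t (hypothetical helper, for illustration only).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            i = t - k * dilation
            if i >= 0:  # skip taps that would reach before the signal start
                acc += wk * x[i]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [float(v) for v in range(8)]
    # Two-tap filter with dilation 2: each output is x[t] + x[t - 2].
    print(causal_dilated_conv(signal, [1.0, 1.0], dilation=2))
    # Doubling dilations grow the receptive field quickly with depth.
    print(receptive_field(2, [1, 2, 4, 8]))
```

Doubling the dilation at each layer lets a shallow stack cover a long history of past samples, which is why this layer type suits raw audio, where useful context spans thousands of samples.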
Similar resources
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinio...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Deep Voice: Real-time Neural Text-to-Speech
We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency pre...
Hybridnet: a Hybrid Neural Architecture to Speed-up Autoregressive Models
This paper introduces HybridNet, a hybrid neural network to speed up autoregressive models for raw audio waveform generation. As an example, we propose a hybrid model that combines an autoregressive network named WaveNet and a conventional LSTM model to address speech synthesis. Instead of generating one sample per time-step, the proposed HybridNet generates multiple samples per time-step by ex...
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
The recently-developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a rea...